What's the worst-case efficiency hit from job duplication? If this ends up meaningfully affecting runtime, we should understand the extent.
This is a catch-all to resolve any lost actions that prevent the stf from continuing execution. Equivalent to periodically restarting the mosaic node.
Zk2u left a comment
If we are going to do this, we should clear the job queue and then restore into it. The clearing operation needs to be done properly: it must not introduce race conditions or violate any of the other guarantees the job scheduler makes to the other components.
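One way to picture the clear-then-restore step is to do both under a single lock, so no other component can observe the queue in its emptied, half-restored state. The following is only a minimal sketch under that assumption: `JobQueue`, `replace_all`, and the `u64` job IDs are hypothetical names for illustration, not the scheduler's actual types, and the sketch does not address jobs that are already running when the swap happens.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

/// Hypothetical job queue; jobs are represented by bare u64 IDs.
struct JobQueue {
    inner: Mutex<VecDeque<u64>>,
}

impl JobQueue {
    fn new() -> Self {
        JobQueue { inner: Mutex::new(VecDeque::new()) }
    }

    fn push(&self, job: u64) {
        self.inner.lock().expect("job queue mutex poisoned").push_back(job);
    }

    fn pop(&self) -> Option<u64> {
        self.inner.lock().expect("job queue mutex poisoned").pop_front()
    }

    /// Clear the queue and refill it with the restored jobs while holding
    /// the lock for the whole operation. Returns how many queued jobs
    /// were dropped by the clear.
    fn replace_all(&self, restored: impl IntoIterator<Item = u64>) -> usize {
        let mut queue = self.inner.lock().expect("job queue mutex poisoned");
        let dropped = queue.len();
        queue.clear();
        queue.extend(restored);
        dropped
    }
}
```

Holding one lock across the clear and the refill keeps the swap atomic with respect to `push` and `pop`, but whether that preserves the scheduler's other guarantees (for example around in-flight jobs and their completions) would still need to be checked against the real implementation.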
Blocking issue in […]. This PR still rebuilds […]. So as written, this PR reintroduces the same completion-loss class we traced for the […]. This is on the assumption that this is actually a good idea, which I'm not sure it is; we should discuss on Slack first.
Description
Run stf::restore() periodically on all state machines, at a configured time interval.
This is a catch-all to resolve any lost actions that prevent the stf from continuing execution.
Equivalent to periodically restarting the mosaic node.
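As a rough illustration of the mechanism described above, the sketch below triggers a restore on every state machine once a configured interval has elapsed. It is not the PR's implementation: `Restorable`, `PeriodicRestorer`, `tick`, and `CountingMachine` are hypothetical names, and the real stf::restore() signature is not shown in this description.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a state machine that can re-emit its
/// pending actions. Returns how many actions were re-emitted.
trait Restorable {
    fn restore(&mut self) -> usize;
}

/// Toy state machine used only to exercise the sketch.
struct CountingMachine {
    pending: usize,
    restores: usize,
}

impl Restorable for CountingMachine {
    fn restore(&mut self) -> usize {
        self.restores += 1;
        self.pending
    }
}

/// Tracks when the last restore ran and decides when the next is due.
struct PeriodicRestorer {
    interval: Duration,
    last_run: Instant,
}

impl PeriodicRestorer {
    fn new(interval: Duration, now: Instant) -> Self {
        PeriodicRestorer { interval, last_run: now }
    }

    /// Restores every machine if at least `interval` has passed since the
    /// last run. Returns the total number of re-emitted actions, or None
    /// if the restore was not yet due.
    fn tick<R: Restorable>(&mut self, now: Instant, machines: &mut [R]) -> Option<usize> {
        if now.duration_since(self.last_run) < self.interval {
            return None;
        }
        self.last_run = now;
        let total: usize = machines.iter_mut().map(|m| m.restore()).sum();
        Some(total)
    }
}
```

Passing `now` in explicitly, rather than reading the clock inside `tick`, is a design choice for this sketch so the timing logic can be tested without sleeping.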
Type of Change
Notes to Reviewers
The job scheduler does not currently dedupe actions, so periodic restores can cause duplicate job runs. This does not affect the correctness of the process, only its efficiency.
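For reference, deduplication at submission time could look roughly like the sketch below. This is an assumption about one possible approach, not something this PR implements: `ActionKey`, `DedupQueue`, and their fields are hypothetical, and it assumes each action can be identified by a stable key.

```rust
use std::collections::{HashSet, VecDeque};

/// Hypothetical identity for an action: which machine emitted it and
/// which action it is.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct ActionKey {
    machine_id: u64,
    action_id: u64,
}

/// Queue that refuses to enqueue an action that is already queued or
/// still in flight.
#[derive(Default)]
struct DedupQueue {
    pending: VecDeque<ActionKey>,
    known: HashSet<ActionKey>,
}

impl DedupQueue {
    /// Returns true if the action was enqueued, false if it was a duplicate.
    fn submit(&mut self, key: ActionKey) -> bool {
        // HashSet::insert returns false when the value was already present.
        if !self.known.insert(key.clone()) {
            return false;
        }
        self.pending.push_back(key);
        true
    }

    /// Hands out the next job. Its key stays in `known` until `complete`
    /// is called, so a restore during execution does not re-enqueue it.
    fn next(&mut self) -> Option<ActionKey> {
        self.pending.pop_front()
    }

    /// Marks a job as finished, allowing the same key to be submitted again.
    fn complete(&mut self, key: &ActionKey) {
        self.known.remove(key);
    }
}
```

With something like this in place, a periodic restore that re-emits an already queued or running action would be dropped at `submit`, which would bound the inefficiency noted above.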
Checklist
Related Issues